Conversation
|
|
||
| /** Google ad-serving domains whose URLs should be proxied (exact match). */ | ||
| const GPT_DOMAINS = [ | ||
| 'securepubads.g.doubleclick.net', |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
General approach: whenever we turn a hostname string into a regular expression, we should escape all regex metacharacters, not just dots, so that the regex always represents a literal hostname match. This prevents any accidental regex behavior if hostname strings change in the future.
Best minimal fix here: introduce a small helper that takes a hostname string and returns its fully regex-escaped form, then use that helper instead of domain.replace(/\./g, '\\.') when constructing the RegExp in the rewriteUrl fallback. This keeps existing behavior for the current hostnames while making the code robust against future changes, and directly addresses the CodeQL concern that the domain strings are used as regex patterns.
Concrete changes in crates/js/lib/src/integrations/gpt/script_guard.ts:
- Add a helper function, e.g.
escapeHostnameForRegex(hostname: string): string, that replaces all regex metacharacters ([\\^$.*+?()[]{}|]) with escaped versions. - In the loop within the
catchblock ofrewriteUrl, replace the inlinedomain.replace(/\./g, '\\.')call withescapeHostnameForRegex(domain). - Keep all existing logic and data structures unchanged; we’re only improving the regex-construction step.
No external libraries are needed; we can implement the escape helper with a simple .replace call.
| @@ -51,6 +51,15 @@ | ||
| /** Regex to match wildcard `*.safeframe.googlesyndication.com` subdomains. */ | ||
| const SAFEFRAME_WILDCARD_RE = /[a-zA-Z0-9_-]+\.safeframe\.googlesyndication\.com/i; | ||
|
|
||
| /** | ||
| * Escape a hostname string so it can be safely interpolated into a RegExp | ||
| * as a literal. This prevents regex metacharacters from changing the match | ||
| * semantics if domain strings ever contain them. | ||
| */ | ||
| function escapeHostnameForRegex(hostname: string): string { | ||
| return hostname.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); | ||
| } | ||
|
|
||
| /** Integration route prefix on the first-party domain. */ | ||
| const PROXY_PREFIX = '/integrations/gpt'; | ||
|
|
||
| @@ -97,8 +106,9 @@ | ||
| for (const domain of GPT_DOMAINS) { | ||
| if (lower.includes(domain)) { | ||
| const prefix = hostPrefixForDomain(domain); | ||
| const escapedDomain = escapeHostnameForRegex(domain); | ||
| return originalUrl.replace( | ||
| new RegExp(`https?://(?:www\\.)?${domain.replace(/\./g, '\\.')}`, 'i'), | ||
| new RegExp(`https?://(?:www\\.)?${escapedDomain}`, 'i'), | ||
| `${window.location.protocol}//${window.location.host}${PROXY_PREFIX}${prefix}`, | ||
| ); | ||
| } |
| /** Google ad-serving domains whose URLs should be proxied (exact match). */ | ||
| const GPT_DOMAINS = [ | ||
| 'securepubads.g.doubleclick.net', | ||
| 'pagead2.googlesyndication.com', |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames
Copilot Autofix
AI 3 days ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
| const GPT_DOMAINS = [ | ||
| 'securepubads.g.doubleclick.net', | ||
| 'pagead2.googlesyndication.com', | ||
| 'tpc.googlesyndication.com', |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 13 days ago
In general, to avoid incomplete hostname regular expressions, any string used to build a regex should be run through a generic “escape for regex literal” function that escapes all regex metacharacters, not only dots. This ensures future additions to GPT_DOMAINS cannot accidentally introduce patterns that match more than the literal hostname.
Concretely, in crates/js/lib/src/integrations/gpt/script_guard.ts, the fallback block in rewriteUrl currently builds the regex with:
new RegExp(`https?://(?:www\\.)?${domain.replace(/\./g, '\\.')}`, 'i')This manually escapes only dots in domain. Replace this with a helper escapeRegex (defined in this file) that escapes every regex metacharacter: \ ^ $ * + ? . ( ) | { } [ ]. Use that helper both for domain and for any other future regex constructions based on literal strings if needed. The change is localized to this file: add the helper function (near the top, after constants or helpers) and change the new RegExp(...) call to use escapeRegex(domain) instead of domain.replace(/\./g, '\\.'). No behavior changes for current values, but it becomes robust and satisfies the security rule.
| @@ -54,6 +54,13 @@ | ||
| /** Integration route prefix on the first-party domain. */ | ||
| const PROXY_PREFIX = '/integrations/gpt'; | ||
|
|
||
| /** | ||
| * Escape a string so it can be safely used inside a RegExp literal. | ||
| */ | ||
| function escapeRegex(value: string): string { | ||
| return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); | ||
| } | ||
|
|
||
| // --------------------------------------------------------------------------- | ||
| // URL matching and rewriting | ||
| // --------------------------------------------------------------------------- | ||
| @@ -98,7 +105,7 @@ | ||
| if (lower.includes(domain)) { | ||
| const prefix = hostPrefixForDomain(domain); | ||
| return originalUrl.replace( | ||
| new RegExp(`https?://(?:www\\.)?${domain.replace(/\./g, '\\.')}`, 'i'), | ||
| new RegExp(`https?://(?:www\\.)?${escapeRegex(domain)}`, 'i'), | ||
| `${window.location.protocol}//${window.location.host}${PROXY_PREFIX}${prefix}`, | ||
| ); | ||
| } |
| 'pagead2.googlesyndication.com', | ||
| 'tpc.googlesyndication.com', | ||
| 'googletagservices.com', | ||
| 'www.googletagservices.com', |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames
Copilot Autofix
AI 3 days ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
| 'tpc.googlesyndication.com', | ||
| 'googletagservices.com', | ||
| 'www.googletagservices.com', | ||
| 'cm.g.doubleclick.net', |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames
Copilot Autofix
AI 3 days ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
| 'cm.g.doubleclick.net', | ||
| 'ep1.adtrafficquality.google', | ||
| 'ep2.adtrafficquality.google', | ||
| 'www.googleadservices.com', |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames
Copilot Autofix
AI 3 days ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
b6ec286 to
b811668
Compare
|
We should consider the scope of this integration. Currently this is doing a lot of rewriting and proxying it's not catching everything but many scripts and 3rd party calls are proxied through the 1st party context. |
a800ad7 to
264f64f
Compare
264f64f to
597bef9
Compare
|
@ChristianPavilonis Can you please make changes or resolve comment for the previous review? |
|
Made some changes @aram356 |
d03f6aa to
295b166
Compare
There was a problem hiding this comment.
Summary
👍 Solid GPT first-party proxy integration. The Rust code follows existing integration patterns well (error-stack, log::, Arc, IntegrationRegistration::builder). The client-side script guard is thorough with six interception layers. Two documentation accuracy issues need fixing before merge.
Findings
🔧 High (P1)
X-Script-Sourceheader claimed but not implemented:docs/guide/integrations/gpt.md:76claims proxy responses include anX-Script-Sourceheader, butgpt.rsnever sets it. Either add the header or fix the docs. (inline comment)- Wrong GitHub org in doc links:
docs/guide/integrations/gpt.md:150-151links toAnomalyCo/trusted-serverinstead ofIABTechLab/trusted-server. (inline comment)
🤔 Medium (P2)
- Awkward module doc line break:
gpt.rs:13-15splits a sentence mid-phrase, creating an orphan line in rendered docs. (inline comment) - Unverified doc links:
docs/guide/integrations/gpt.md:155-158references/guide/integrations-overview,/guide/configuration,/guide/first-party-proxy, and/guide/integrations/gam— verify these pages exist or remove the links.
⛏ Low (P3)
- Inconsistent dash style:
gpt.mduses ASCII--throughout while the rest of the project uses em-dash (—).
CI Status
- cargo fmt: PASS
- cargo test: PASS
- vitest: PASS
- CodeQL: PASS
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
Solid GPT first-party proxy integration with broad Rust/TS coverage and green CI. I found one high-severity runtime issue and one medium-severity test gap that should be addressed before merge.
Findings
🔧 High (P1)
- GPT shim auto-init can be skipped due to injection order (
crates/js/lib/src/integrations/gpt/index.ts:168)
Thewindow.__tsjs_gpt_enabledcheck runs at module evaluation time, but the enable flag is injected after the unified TSJS bundle in the HTML processor. This can leave the shim uninstalled on pages that rely on auto-init.
🤔 Medium (P2)
- Disabled gating test cannot detect auto-init regressions (
crates/js/lib/test/integrations/gpt/index.test.ts:202)
The test resets guard state immediately after import before asserting, so it passes even if auto-init had run during import.
CI Status
- Analyze (actions): PASS
- Analyze (javascript-typescript): PASS
- Analyze (rust): PASS
- cargo fmt: PASS
- cargo test: PASS
- vitest: PASS
- format-docs: PASS
- format-typescript: PASS
- CodeQL: PASS
The server-injected inline script now calls __tsjs_installGptShim() directly instead of relying on a module-scope auto-init check that raced against the enable flag. This ensures the shim installs regardless of script injection order. Also fixes the vacuous disabled-gating test that reset state before asserting, making it always pass.
What this does
Proxies gpt.js script and additional scripts loaded by gpt.js that do the heavy lifting for ad calls for google.
Change Summary
Closes #227
Verifying GPT Script Proxying in the Browser
To confirm that GPT scripts are being proxied through the trusted server domain: